Movie Recommendation System

In this article, a recommender system is developed that suggests similar movies based on the MovieLens datasets.

Datasets

Preprocessing

Imputing the missing values.

Creating Movie info dataset (for analysis only)

Age Groups

We can create age categories based on the age groupings published by Statistics Canada (statcan.gc.ca).
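As a minimal sketch of this binning step, pandas' cut function can map each age to a labelled interval. The column names, bin edges, and labels below are assumptions chosen for illustration; they are not taken from the article's data.

```python
import pandas as pd

# Hypothetical user table; column names and ages are illustrative only.
users = pd.DataFrame({"user_id": [1, 2, 3, 4], "age": [7, 19, 34, 70]})

# Assumed bin edges and labels (not the article's actual categories).
bins = [0, 14, 24, 64, 120]
labels = ["Children", "Youth", "Adults", "Seniors"]

# right=True assigns each age to the interval (lower, upper].
users["age_group"] = pd.cut(users["age"], bins=bins, labels=labels, right=True)
print(users)
```

Ages that fall outside the outermost bin edges would become NaN, so the edges should cover the full range of the data.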

Data Analysis

Creating the dataset that we use for the analysis in this section.

Most Watched Movies by Gender

Ratings

Modeling

The next step is to create a matrix that has the user ids on one axis and the movie titles on the other. Each cell then holds the rating that a particular user gave to a particular movie. For this purpose, we use the pandas pivot_table function (see the pandas documentation for more details).

Since most users have rated only some of the movies, many cells in this matrix are NaN.
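The pivot step can be sketched on a toy table as follows. The column names user_id, title, and rating are assumptions for illustration and may differ from the ones used in the article's dataset.

```python
import pandas as pd

# Toy long-format ratings; column names are assumed for illustration.
ratings = pd.DataFrame({
    "user_id": [1, 1, 2, 3],
    "title": ["Movie A", "Movie B", "Movie A", "Movie B"],
    "rating": [5, 3, 4, 2],
})

# Rows are users, columns are titles, cells are ratings.
# If a user rated a title more than once, pivot_table averages the
# duplicates by default (aggfunc="mean").
rating_matrix = ratings.pivot_table(index="user_id", columns="title", values="rating")
print(rating_matrix)
```

User 2 never rated Movie B and user 3 never rated Movie A, so those two cells come out as NaN.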

Checking out the top ten movies that have an average rating greater than 4.0, ordered by the number of ratings they received.
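One way to produce such a list is to compute the mean rating and the rating count per title, filter on the mean, and sort by the count. This is a sketch on toy data; the column names and the summary table are assumptions, not the article's actual objects.

```python
import pandas as pd

# Toy long-format ratings; column names are assumed for illustration.
ratings = pd.DataFrame({
    "title": ["A", "A", "A", "B", "B", "C"],
    "rating": [5, 5, 4, 3, 4, 5],
})

# Mean rating and number of ratings per title.
summary = ratings.groupby("title")["rating"].agg(mean_rating="mean", num_ratings="count")

# Keep titles with a mean above 4.0, then take the ten most-rated ones.
top = (
    summary[summary["mean_rating"] > 4.0]
    .sort_values("num_ratings", ascending=False)
    .head(10)
)
print(top)
```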

Developing a Recommender Function

To do so, first pick a movie title as an example.

Now let's consider the user ratings for this movie:

We can then use the corrwith() method to compute the correlation between this movie's ratings (a pandas Series) and every other column of the rating matrix:

Cleaning the data by removing NaN values and storing the result in a DataFrame.
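The correlation and cleaning steps can be sketched as below. The toy matrix and the names target_ratings and similar are hypothetical; only corrwith() and dropna() come from the text.

```python
import numpy as np
import pandas as pd

# Toy user-by-title rating matrix (values are illustrative only).
rating_matrix = pd.DataFrame(
    {
        "Movie A": [5, 4, 1, np.nan],
        "Movie B": [4, 5, 2, 1],
        "Movie C": [1, 2, 5, np.nan],
        "Movie D": [np.nan, np.nan, np.nan, 3],
    },
    index=pd.Index([1, 2, 3, 4], name="user_id"),
)

# Ratings of the movie we want recommendations for.
target_ratings = rating_matrix["Movie A"]

# Correlate every column with the target, using only the users who rated both.
correlations = rating_matrix.corrwith(target_ratings)

# Titles with no users in common with the target get NaN; drop them.
similar = pd.DataFrame(correlations, columns=["correlation"]).dropna()
print(similar)
```

Movie D shares no raters with Movie A, so its correlation is NaN and it disappears after dropna().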

However, some of these movies were rated by only a few users, which makes their correlations unreliable. Suppose that we are only interested in movies that have at least 100 reviews. We have,

Now sort the values and notice how the titles make a lot more sense:
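The filter and sort can be sketched as follows. The correlation values and rating counts are made up for illustration, and the names similar, num_ratings, and popular are assumptions.

```python
import pandas as pd

titles = pd.Index(["Movie A", "Movie B", "Movie C", "Movie D"], name="title")

# Hypothetical correlations with the chosen movie.
similar = pd.DataFrame({"correlation": [1.0, 0.95, 0.60, 0.40]}, index=titles)

# Hypothetical number of ratings per title.
num_ratings = pd.Series([250, 3, 180, 120], index=titles, name="num_ratings")

# Attach the counts, keep titles with at least 100 ratings, sort by correlation.
similar = similar.join(num_ratings)
popular = similar[similar["num_ratings"] >= 100].sort_values("correlation", ascending=False)
print(popular)
```

Movie B has a high correlation but only 3 ratings, so it is filtered out before sorting.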

We can carry out a similar analysis for any other movie from the list. We summarize the results in the next section.

Conclusions and Movie Recommendations

Based on the analysis, we can create a function that recommends movies similar to one that we just watched.

The following function recommends N movies (for example, four) similar to the input movie Inp.
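A minimal sketch of such a function is given below. It is not the article's original implementation: the name recommend_movies, its parameters, and the toy matrix are assumptions, and the defaults mirror the four recommendations and the 100-review threshold mentioned in the text.

```python
import numpy as np
import pandas as pd


def recommend_movies(rating_matrix, inp, n=4, min_ratings=100):
    """Return up to n titles most correlated with inp (hypothetical helper)."""
    if inp not in rating_matrix.columns:
        raise KeyError(f"{inp!r} is not a column of the rating matrix")
    # Number of non-missing ratings per title.
    num_ratings = rating_matrix.count()
    # Correlation of every title with the input title; drop undefined values.
    correlations = rating_matrix.corrwith(rating_matrix[inp]).dropna()
    # Keep only titles with enough ratings, and exclude the input itself.
    eligible = correlations[num_ratings[correlations.index] >= min_ratings]
    eligible = eligible.drop(labels=inp, errors="ignore")
    return eligible.sort_values(ascending=False).head(n).index.tolist()


# Toy user-by-title matrix; a small threshold is used because the data is tiny.
toy = pd.DataFrame({
    "Movie A": [5, 4, 1, 2, 5],
    "Movie B": [4, 5, 2, 1, 4],
    "Movie C": [1, 2, 5, 4, 1],
    "Movie D": [5, 3, np.nan, np.nan, np.nan],
})
print(recommend_movies(toy, "Movie A", n=1, min_ratings=3))
```

In the toy data, Movie D correlates perfectly with Movie A but has only two ratings, so the threshold removes it. The function may return fewer than n titles when few movies pass the threshold.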

For example, consider Star Wars. For this movie, the function recommends the following movies: